Overview

Dataset Statistics

Number of Variables 12
Number of Rows 891
Missing Cells 177
Missing Cells (%) 1.7%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 333.9 KB
Average Row Size in Memory 383.7 B
Variable Types
  • Numerical: 7
  • Categorical: 5
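The two derived figures above follow from the raw counts: missing cells divided by all rows × variables cells, and total memory divided by the row count. The following is a minimal Python sketch. It assumes the report's "KB" means 1024 bytes; with 1000 bytes the per-row figure would come out near 374.7 B rather than 383.7 B. The function names are illustrative.

```python
def missing_cell_pct(missing_cells, n_rows, n_vars):
    """Missing cells as a percentage of all (rows x variables) cells."""
    return 100.0 * missing_cells / (n_rows * n_vars)


def avg_row_bytes(total_kb, n_rows, bytes_per_kb=1024):
    """Average in-memory row size in bytes; assumes 1 KB = 1024 B by default."""
    return total_kb * bytes_per_kb / n_rows


print(f"{missing_cell_pct(177, 891, 12):.1f}%")  # 1.7%
print(f"{avg_row_bytes(333.9, 891):.1f} B")      # 383.7 B
```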

Dataset Insights

PassengerId is uniformly distributed [Uniform]
Age has 177 (19.87%) missing values [Missing]
Survived is skewed [Skewed]
Pclass is skewed [Skewed]
SibSp is skewed [Skewed]
Parch is skewed [Skewed]
Fare is skewed [Skewed]
Name has high cardinality: 891 distinct values [High Cardinality]
Ticket has high cardinality: 681 distinct values [High Cardinality]

Cabin has high cardinality: 148 distinct values [High Cardinality]
Name has all distinct values [Unique]
Survived has 549 (61.62%) zeros [Zeros]
SibSp has 608 (68.24%) zeros [Zeros]
Parch has 678 (76.09%) zeros [Zeros]

Variables

PassengerId

numerical

Approximate Distinct Count 891
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 13.9 KB
Mean 446
Minimum 1
Maximum 891
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • PassengerId is uniformly distributed

Quantile Statistics

Minimum 1
5th Percentile 45.5
Q1 223.5
Median 446
Q3 668.5
95th Percentile 846.5
Maximum 891
Range 890
IQR 445
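These quantiles are consistent with linear interpolation between order statistics at the 0-based position h = (n - 1) × q, which is NumPy's default rule. For PassengerId, the 5th percentile sits at h = 890 × 0.05 = 44.5, halfway between 45 and 46. The report does not name its method, so the following is a sketch under that assumption:

```python
import math


def quantile_linear(values, q):
    """Quantile by linear interpolation between order statistics.

    The 0-based position is h = (n - 1) * q; the result interpolates between
    the sorted values at floor(h) and ceil(h).
    """
    xs = sorted(values)
    h = (len(xs) - 1) * q
    lo, hi = math.floor(h), math.ceil(h)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


ids = range(1, 892)  # PassengerId takes every integer from 1 to 891
summary = {q: quantile_linear(ids, q) for q in (0.05, 0.25, 0.5, 0.75, 0.95)}
print(summary)  # {0.05: 45.5, 0.25: 223.5, 0.5: 446.0, 0.75: 668.5, 0.95: 846.5}
```

The IQR then follows as 668.5 - 223.5 = 445.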

Descriptive Statistics

Mean 446
Standard Deviation 257.3538
Variance 66231
Sum 397386
Skewness 0
Kurtosis -1.2
Coefficient of Variation 0.577
  • PassengerId is not normally distributed (p-value 7.259388077973426e-05)
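PassengerId takes every integer from 1 to n = 891 exactly once, so its descriptive statistics have closed forms. These give a quick check on the figures above. The variance shown uses the sample (n - 1) denominator, and the skewness is 0 because the values are symmetric about the mean.

```latex
\sum_{i=1}^{n} i = \frac{n(n+1)}{2} = \frac{891 \cdot 892}{2} = 397386,
\qquad
\bar{x} = \frac{n+1}{2} = 446

s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(i - \bar{x}\right)^2
    = \frac{1}{n-1}\cdot\frac{n(n^2-1)}{12}
    = \frac{n(n+1)}{12}
    = \frac{891 \cdot 892}{12} = 66231

s = \sqrt{66231} \approx 257.3538,
\qquad
\mathrm{CV} = \frac{s}{\bar{x}} \approx \frac{257.3538}{446} \approx 0.577
```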

Survived

numerical

Approximate Distinct Count 2
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 13.9 KB
Mean 0.3838
Minimum 0
Maximum 1
Zeros 549
Zeros (%) 61.6%
Negatives 0
Negatives (%) 0.0%
  • Survived is skewed right (γ1 = 0.4777)

Quantile Statistics

Minimum 0
5th Percentile 0
Q1 0
Median 0
Q3 1
95th Percentile 1
Maximum 1
Range 1
IQR 1

Descriptive Statistics

Mean 0.3838
Standard Deviation 0.4866
Variance 0.2368
Sum 342
Skewness 0.4777
Kurtosis -1.7718
Coefficient of Variation 1.2677
  • Survived is not normally distributed (p-value 3.6497285278016304e-20)
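The Survived figures can be reproduced from its counts alone: 549 zeros and a Sum of 342. Matching them requires a mix of conventions, inferred from the numbers rather than documented by the report:

- Variance and standard deviation use the sample (n - 1) denominator.
- Skewness and kurtosis are the moment-based g1 and excess g2, both with an n denominator.

The following is a sketch under that assumption; `describe` is an illustrative name.

```python
import math


def describe(values):
    """Mean, sample std/variance, moment-based skewness, excess kurtosis, CV.

    Inferred conventions: variance uses n - 1; skewness (g1) and excess
    kurtosis (g2) use population central moments (n denominator).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    m2 = sum(d ** 2 for d in dev) / n  # population central moments
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    var = m2 * n / (n - 1)  # sample variance
    std = math.sqrt(var)
    return {
        "mean": mean,
        "std": std,
        "variance": var,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,
        "cv": std / mean,
    }


# Survived is 0/1: 549 zeros and 891 - 549 = 342 ones.
survived = [0] * 549 + [1] * 342
for name, value in describe(survived).items():
    print(f"{name}: {value:.4f}")
```

Each printed value matches the Descriptive Statistics block above to four decimals.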

Pclass

numerical

Approximate Distinct Count 3
Approximate Unique (%) 0.3%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 13.9 KB
Mean 2.3086
Minimum 1
Maximum 3
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Pclass is skewed left (γ1 = -0.6295)

Quantile Statistics

Minimum 1
5th Percentile 1
Q1 2
Median 3
Q3 3
95th Percentile 3
Maximum 3
Range 2
IQR 1

Descriptive Statistics

Mean 2.3086
Standard Deviation 0.8361
Variance 0.699
Sum 2057
Skewness -0.6295
Kurtosis -1.2796
Coefficient of Variation 0.3621
  • Pclass is not normally distributed (p-value 7.13662382541816e-20)

Name

categorical

Approximate Distinct Count 891
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Memory Size 80.0 KB

Length

Mean 26.9652
Standard Deviation 9.2816
Median 25
Minimum 12
Maximum 82

Sample

1st row Braund, Mr. Owen H...
2nd row Cumings, Mrs. John...
3rd row Heikkinen, Miss. L...
4th row Futrelle, Mrs. Jac...
5th row Allen, Mr. William...

Letter

Count 19091
Lowercase Letter 15446
Space Separator 2735
Uppercase Letter 3645
Dash Punctuation 13
Decimal Number 0
  • Name contains 1522 distinct words
  • The most frequent word (mr) occurs about 2.86 times as often as the second most frequent word (miss)
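The Letter block appears to count characters by Unicode general category. Its labels are the standard category names (Ll, Zs, Lu, Pd, Nd), and "Count" equals lowercase plus uppercase letters (15446 + 3645 = 19091). The following sketch reproduces that tally with Python's `unicodedata` module. The mapping and this reading of "Count" are inferences from the numbers, and the two names in the example are made up.

```python
import unicodedata
from collections import Counter

# Report labels -> Unicode general-category codes (inferred mapping).
CATEGORIES = {
    "Lowercase Letter": "Ll",
    "Space Separator": "Zs",
    "Uppercase Letter": "Lu",
    "Dash Punctuation": "Pd",
    "Decimal Number": "Nd",
}


def char_profile(values):
    """Count characters per Unicode general category across all strings."""
    cats = Counter(unicodedata.category(ch) for v in values for ch in v)
    profile = {label: cats[code] for label, code in CATEGORIES.items()}
    # "Count" behaves like the total number of letters (lowercase + uppercase).
    profile["Count"] = profile["Lowercase Letter"] + profile["Uppercase Letter"]
    return profile


print(char_profile(["Braund, Mr. Owen", "Smith-Jones, Miss. Ann 2nd"]))
# {'Lowercase Letter': 24, 'Space Separator': 5, 'Uppercase Letter': 7,
#  'Dash Punctuation': 1, 'Decimal Number': 1, 'Count': 31}
```

The same arithmetic applies to the Sex column. Its 4192 lowercase letters over 891 values of length 4 or 6 imply 577 "male" and 314 "female" entries, since 4m + 6(891 - m) = 4192.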

Sex

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory Size 60.7 KB
  • The most frequent value (male) occurs about 1.84 times as often as the second most frequent value (female)

Length

Mean 4.7048
Standard Deviation 0.956
Median 4
Minimum 4
Maximum 6

Sample

1st row male
2nd row female
3rd row female
4th row female
5th row male

Letter

Count 4192
Lowercase Letter 4192
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (male, female) account for over 50.0% of values

Age

numerical

Approximate Distinct Count 88
Approximate Unique (%) 12.3%
Missing 177
Missing (%) 19.9%
Infinite 0
Infinite (%) 0.0%
Memory Size 11.2 KB
Mean 29.6991
Minimum 0.42
Maximum 80
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Age is skewed right (γ1 = 0.3883)
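Unique (%) appears to use the number of non-missing values as its denominator. For Age, 88 / (891 - 177) = 88 / 714 ≈ 12.3%, whereas 88 / 891 would give 9.9%. The report calls its distinct counts approximate without saying how they are estimated, so the following sketch simply counts exactly. The input is a hypothetical stand-in with Age's shape, not real ages.

```python
def distinct_profile(values):
    """Exact distinct count, Unique (%) and Missing (%) for one column.

    None marks a missing entry. Unique (%) divides by the number of
    non-missing values, which reproduces the Age figure.
    """
    if not values:
        raise ValueError("empty column")
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    distinct = len(set(present))
    return {
        "distinct": distinct,
        "unique_pct": round(100.0 * distinct / len(present), 1) if present else 0.0,
        "missing": n_missing,
        "missing_pct": round(100.0 * n_missing / len(values), 1),
    }


# Hypothetical stand-in: 714 present values drawn from 88 distinct numbers,
# plus 177 missing entries (placeholders, not the real ages).
age_like = [float(i % 88) for i in range(714)] + [None] * 177
print(distinct_profile(age_like))
# {'distinct': 88, 'unique_pct': 12.3, 'missing': 177, 'missing_pct': 19.9}
```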

Quantile Statistics

Minimum 0.42
5th Percentile 4
Q1 20.125
Median 28
Q3 38
95th Percentile 56
Maximum 80
Range 79.58
IQR 17.875

Descriptive Statistics

Mean 29.6991
Standard Deviation 14.5265
Variance 211.0191
Sum 21205.17
Skewness 0.3883
Kurtosis 0.1686
Coefficient of Variation 0.4891
  • Age has 11 outliers

SibSp

numerical

Approximate Distinct Count 7
Approximate Unique (%) 0.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 13.9 KB
Mean 0.523
Minimum 0
Maximum 8
Zeros 608
Zeros (%) 68.2%
Negatives 0
Negatives (%) 0.0%
  • SibSp is skewed right (γ1 = 3.6891)

Quantile Statistics

Minimum 0
5th Percentile 0
Q1 0
Median 0
Q3 1
95th Percentile 3
Maximum 8
Range 8
IQR 1

Descriptive Statistics

Mean 0.523
Standard Deviation 1.1027
Variance 1.216
Sum 466
Skewness 3.6891
Kurtosis 17.7735
Coefficient of Variation 2.1085
  • SibSp is not normally distributed (p-value 7.45316281069286e-23)
  • SibSp has 46 outliers

Parch

numerical

Approximate Distinct Count 7
Approximate Unique (%) 0.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 13.9 KB
Mean 0.3816
Minimum 0
Maximum 6
Zeros 678
Zeros (%) 76.1%
Negatives 0
Negatives (%) 0.0%
  • Parch is skewed right (γ1 = 2.7445)

Quantile Statistics

Minimum 0
5th Percentile 0
Q1 0
Median 0
Q3 0
95th Percentile 2
Maximum 6
Range 6
IQR 0

Descriptive Statistics

Mean 0.3816
Standard Deviation 0.8061
Variance 0.6497
Sum 340
Skewness 2.7445
Kurtosis 9.7166
Coefficient of Variation 2.1123
  • Parch is not normally distributed (p-value 3.726396381311177e-24)
  • Parch has 213 outliers
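The report does not say how it flags outliers. The counts are consistent with Tukey's fences at 1.5 × IQR:

- For Parch, Q1 = Q3 = 0, so both fences sit at 0 and all 891 - 678 = 213 nonzero values are flagged.
- For SibSp, the upper fence is 1 + 1.5 × 1 = 2.5.
- For Fare, the upper fence is 31 + 1.5 × 23.0896 ≈ 65.63.

The following is a sketch assuming that rule and linear-interpolation quartiles; the function names are illustrative.

```python
import math


def quantile_linear(xs, q):
    """Linear-interpolation quantile of an already sorted list, position (n - 1) * q."""
    h = (len(xs) - 1) * q
    lo, hi = math.floor(h), math.ceil(h)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def tukey_outliers(values, k=1.5):
    """Return the values outside [Q1 - k * IQR, Q3 + k * IQR] (Tukey's fences)."""
    xs = sorted(values)
    q1, q3 = quantile_linear(xs, 0.25), quantile_linear(xs, 0.75)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < low or x > high]


# Parch stand-in: only the zero/nonzero split (678 zeros of 891) matters here,
# so every nonzero value is replaced by 1 (a hypothetical simplification).
parch_like = [0] * 678 + [1] * 213
print(len(tukey_outliers(parch_like)))  # 213
```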

Ticket

categorical

Approximate Distinct Count 681
Approximate Unique (%) 76.4%
Missing 0
Missing (%) 0.0%
Memory Size 62.4 KB

Length

Mean 6.7508
Standard Deviation 2.7455
Median 6
Minimum 3
Maximum 18

Sample

1st row A/5 21171
2nd row PC 17599
3rd row STON/O2. 3101282
4th row 113803
5th row 373450

Letter

Count 673
Lowercase Letter 21
Space Separator 239
Uppercase Letter 652
Dash Punctuation 0
Decimal Number 4808

Fare

numerical

Approximate Distinct Count 248
Approximate Unique (%) 27.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 13.9 KB
Mean 32.2042
Minimum 0
Maximum 512.3292
Zeros 15
Zeros (%) 1.7%
Negatives 0
Negatives (%) 0.0%
  • Fare is skewed right (γ1 = 4.7793)

Quantile Statistics

Minimum 0
5th Percentile 7.225
Q1 7.9104
Median 14.4542
Q3 31
95th Percentile 112.0791
Maximum 512.3292
Range 512.3292
IQR 23.0896

Descriptive Statistics

Mean 32.2042
Standard Deviation 49.6934
Variance 2469.4368
Sum 28693.9493
Skewness 4.7793
Kurtosis 33.2043
Coefficient of Variation 1.5431
  • Fare is not normally distributed (p-value 5.925743764895219e-18)
  • Fare has 116 outliers

Cabin

categorical

Approximate Distinct Count 148
Approximate Unique (%) 16.6%
Missing 0
Missing (%) 0.0%
Memory Size 59.3 KB
  • The most frequent value (nan) occurs about 171.75 times as often as the second most frequent value (B96 B98)

Length

Mean 3.1347
Standard Deviation 1.021
Median 3
Minimum 1
Maximum 15

Sample

1st row nan
2nd row C85
3rd row nan
4th row C123
5th row nan
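Cabin reports Missing 0, yet "nan" is its most frequent value and its length statistics treat it as a 3-character string. Its 2061 lowercase letters equal 687 × 3, which matches the 171.75 ratio (687 / 4) if every lowercase letter comes from "nan". This suggests missing entries were turned into the literal string "nan" before profiling, for example by casting the column to strings. Embarked's 6 lowercase letters and maximum length of 3 fit two such entries. The following minimal sketch restores these placeholders as real missing values; the placeholder set is an assumption.

```python
# Assumption: these spellings are placeholders for missing data in this export.
MISSING_TOKENS = frozenset({"nan", "NaN", "None", ""})


def normalize_missing(values, tokens=MISSING_TOKENS):
    """Map placeholder strings (e.g. the literal 'nan') to None so they count as missing."""
    return [None if v is None or v.strip() in tokens else v for v in values]


def missing_pct(values):
    """Percentage of entries that are None."""
    return 100.0 * sum(v is None for v in values) / len(values)


cabin_sample = ["nan", "C85", "nan", "C123", "nan"]  # the five sample rows above
cleaned = normalize_missing(cabin_sample)
print(cleaned)               # [None, 'C85', None, 'C123', None]
print(missing_pct(cleaned))  # 60.0
```

Applied to the full column, and if the 687 inferred placeholders are correct, Cabin's missing share would be 687 / 891 ≈ 77.1% rather than 0%.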

Letter

Count 2299
Lowercase Letter 2061
Space Separator 34
Uppercase Letter 238
Dash Punctuation 0
Decimal Number 460
  • The top 2 categories (nan, B96 B98) account for over 50.0% of values
  • The most frequent word (nan) occurs about 171.75 times as often as the second most frequent word (b98)
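The "largest value" insights compare frequencies, not magnitudes. Each ratio is the count of the most common value divided by the count of the runner-up. For Cabin, 171.75 = 687 / 4 if "nan" occurs 687 times (2061 lowercase letters / 3) and "B96 B98" occurs 4 times. The lowercased variants appear to be computed on words, but the tokenization is not documented, so this sketch works on whole values. The input is a hypothetical stand-in with Cabin's shape: 891 rows and 148 distinct values.

```python
from collections import Counter


def top_two(values):
    """Most and second-most frequent values, their count ratio, and combined share."""
    (v1, c1), (v2, c2) = Counter(values).most_common(2)
    return {"first": v1, "second": v2, "ratio": c1 / c2,
            "top2_pct": 100.0 * (c1 + c2) / len(values)}


# Hypothetical stand-in: 687 "nan", 4 "B96 B98", and 146 placeholder cabins
# (54 of them appearing twice), giving 891 rows and 148 distinct values.
others = [f"X{i}" for i in range(146)] + [f"X{i}" for i in range(54)]
cabins = ["nan"] * 687 + ["B96 B98"] * 4 + others
stats = top_two(cabins)
print(stats["first"], stats["second"], stats["ratio"])  # nan B96 B98 171.75
print(f"{stats['top2_pct']:.1f}%")                       # 77.6%
```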

Embarked

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.4%
Missing 0
Missing (%) 0.0%
Memory Size 57.4 KB
  • The most frequent value (S) occurs about 3.83 times as often as the second most frequent value (C)

Length

Mean 1.0045
Standard Deviation 0.0947
Median 1
Minimum 1
Maximum 3

Sample

1st row S
2nd row C
3rd row S
4th row S
5th row S

Letter

Count 895
Lowercase Letter 6
Space Separator 0
Uppercase Letter 889
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (S, C) account for over 50.0% of values
  • The most frequent word (s) occurs about 3.83 times as often as the second most frequent word (c)

Interactions

Correlations

Missing Values